test(whaleflow): replay dogfood workflow from recorded trace #2852
Code Review
This pull request adds replay test coverage for the rlm_cache_change.star dogfood workflow, verifying successful replay from recorded mock traces and proper handling of missing records (resulting in ReplayDiverged). The changes include new tests and helper functions for trace reconstruction in crates/whaleflow/src/starlark_authoring.rs, alongside corresponding documentation and changelog updates. Feedback on the code changes highlights a potential simplification using .flatten() instead of .cloned().unwrap_or(None), and identifies a critical limitation in the trace reconstruction helper collect_leaf_records which statically traverses the AST and will fail to correctly reconstruct traces for workflows containing loops.
    .map(|dependency| {
        (
            dependency.clone(),
            resolved_outputs.get(dependency).cloned().unwrap_or(None),
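The `.flatten()` simplification mentioned in the review can be sketched as follows. This is a minimal illustration, not the PR's code: the `ResolvedOutputs` alias, the `String` value type, and the helper names are assumptions made for the example, since the real type of `resolved_outputs` is not shown in the excerpt.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the map in the PR: a dependency id maps to an
// optional output (None meaning the dependency produced nothing).
type ResolvedOutputs = HashMap<String, Option<String>>;

/// Shape used in the diff: a missing key and a present-but-None value
/// both end up as None.
fn lookup_unwrap_or(outputs: &ResolvedOutputs, dependency: &str) -> Option<String> {
    outputs.get(dependency).cloned().unwrap_or(None)
}

/// Suggested shape: `Option<Option<T>>::flatten` collapses the two layers
/// with the same result.
fn lookup_flatten(outputs: &ResolvedOutputs, dependency: &str) -> Option<String> {
    outputs.get(dependency).cloned().flatten()
}

/// Illustrative data: one key with a value, one key with None.
fn sample_outputs() -> ResolvedOutputs {
    let mut outputs = HashMap::new();
    outputs.insert("present".to_string(), Some("ok".to_string()));
    outputs.insert("empty".to_string(), None);
    outputs
}
```

Both helpers agree for a present value, a present `None`, and a missing key, so the change is purely stylistic.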
    WorkflowNode::Leaf(leaf) => {
        let result = results
            .iter()
            .find(|result| result.leaf_id == leaf.id)
            .expect("mock execution should record every declared leaf")
            .clone();
Limitation in Trace Reconstruction for Loops / Multiple Executions
The collect_leaf_records helper statically traverses the workflow AST (&workflow.nodes) to reconstruct the replay trace.
Because it performs a static traversal:
- It will only visit each Leaf node once, even if that leaf is executed multiple times (e.g., inside a LoopUntil block with multiple iterations).
- The .find() call on line 590 will always retrieve the first execution result of that leaf, ignoring subsequent iterations.
This means any workflow containing loops that execute more than once will produce an incomplete or incorrect replay trace, leading to ReplayDiverged errors during replay. Consider refactoring this helper to map directly over the dynamic execution results (execution.leaf_results) and resolve their dependencies dynamically, or document this limitation if it is strictly intended for single-iteration test scenarios.
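The difference between the two approaches can be sketched with simplified stand-in types. Everything here is hypothetical: the `LeafResult` and `WorkflowNode` shapes, the `iteration` field, and the function names are assumptions for illustration, not the definitions in crates/whaleflow.

```rust
// Hypothetical, simplified types; the real WhaleFlow definitions are not
// shown in this review.
#[derive(Clone, Debug, PartialEq)]
struct LeafResult {
    leaf_id: String,
    iteration: u32,
}

enum WorkflowNode {
    Leaf { id: String },
    LoopUntil { body: Vec<WorkflowNode> },
}

/// Static traversal: visits each declared leaf once and takes the first
/// matching result, so later loop iterations are never recorded.
fn collect_static(nodes: &[WorkflowNode], results: &[LeafResult], out: &mut Vec<LeafResult>) {
    for node in nodes {
        match node {
            WorkflowNode::Leaf { id } => {
                if let Some(result) = results.iter().find(|r| &r.leaf_id == id) {
                    out.push(result.clone());
                }
            }
            WorkflowNode::LoopUntil { body } => collect_static(body, results, out),
        }
    }
}

/// Dynamic alternative: one record per recorded execution, in execution order.
fn collect_dynamic(results: &[LeafResult]) -> Vec<LeafResult> {
    results.to_vec()
}

/// A loop whose single leaf ran three times (illustrative data only).
fn looped_example() -> (Vec<WorkflowNode>, Vec<LeafResult>) {
    let nodes = vec![WorkflowNode::LoopUntil {
        body: vec![WorkflowNode::Leaf { id: "regression-tests".to_string() }],
    }];
    let results = (0..3)
        .map(|iteration| LeafResult { leaf_id: "regression-tests".to_string(), iteration })
        .collect();
    (nodes, results)
}
```

With the looped example, the static traversal yields one record (iteration 0) while the dynamic version yields three, which is the gap that would surface as a divergence during replay.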
Summary
- workflows/rlm_cache_change.star dogfood workflow
- regression-tests, teacher-review, and summarize-cache-change
- Missing regression-tests record produces ReplayDiverged instead of falling back to live execution
- workflow_run, provider calls, TraceStore writes, worktree application, and TUI pod monitor behavior deferred

Refs #2726 and #2679. Preserves and credits the WhaleFlow direction from #2482/#2486; thanks @AdityaVG13 for the original WhaleFlow draft and cost-tracking foundation.
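The "diverge instead of falling back" behavior described in the summary can be illustrated with a small sketch. The `ReplayTrace` struct, the `ReplayError` enum shape, the helper names, and the recorded output strings are all hypothetical; only the leaf names and the `ReplayDiverged` name come from this PR.

```rust
use std::collections::HashMap;

// Hypothetical error type; only the variant name is taken from the PR.
#[derive(Debug, PartialEq)]
enum ReplayError {
    ReplayDiverged { leaf_id: String },
}

// Hypothetical trace: recorded output per leaf id.
struct ReplayTrace {
    records: HashMap<String, String>,
}

impl ReplayTrace {
    /// Returns the recorded output, or an error when the record is missing.
    /// There is deliberately no fallback to live execution.
    fn replay_leaf(&self, leaf_id: &str) -> Result<String, ReplayError> {
        self.records
            .get(leaf_id)
            .cloned()
            .ok_or_else(|| ReplayError::ReplayDiverged { leaf_id: leaf_id.to_string() })
    }
}

/// Replays leaves in order and stops at the first missing record.
fn replay_all(trace: &ReplayTrace, leaf_ids: &[&str]) -> Result<Vec<String>, ReplayError> {
    leaf_ids.iter().map(|&id| trace.replay_leaf(id)).collect()
}

/// Builds a trace from (leaf id, recorded output) pairs.
fn trace_from(pairs: &[(&str, &str)]) -> ReplayTrace {
    let records = pairs
        .iter()
        .map(|&(id, output)| (id.to_string(), output.to_string()))
        .collect();
    ReplayTrace { records }
}
```

Dropping the regression-tests entry from an otherwise complete trace makes `replay_all` return `ReplayDiverged` for that leaf rather than any output.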
Verification
- cargo test -p codewhale-whaleflow rlm_cache_change --locked
- cargo fmt --all --check
- git diff --check
- cmp -s CHANGELOG.md crates/tui/CHANGELOG.md
- ./scripts/release/check-versions.sh
- ./scripts/release/check-ohos-deps.sh